Bootstrapping Named Entity Recognition with Automatically Generated Gazetteer Lists
Author
Abstract
Current Named Entity Recognition systems suffer from a lack of hand-tagged data, as well as from performance degradation when moved to other domains. This paper explores two aspects: the automatic generation of gazetteer lists from unlabeled data, and the building of a Named Entity Recognition system from both labeled and unlabeled data.
Similar resources
Gazetteer Preparation for Named Entity Recognition in Indian Languages
This paper describes our approaches for the preparation of gazetteers for named entity recognition (NER) in Indian languages. We have described two methodologies for the preparation of gazetteers. Since the relevant gazetteer lists are more easily available in English, we have used a transliteration-based approach to convert available English name lists to Indian languages. The second approach ...
N-gram and Gazetteer List Based Named Entity Recognition for Urdu: A Scarce Resourced Language
Extraction of named entities (NEs) from text is an important operation in many natural language processing applications, such as information extraction, question answering and machine translation. Since the early 1990s, researchers have taken a growing interest in this field, and a lot of work has been done on Named Entity Recognition (NER) in different languages of the world. Unfortunately...
Identifying Named Entities in Text Databases from the Natural History Domain
In this paper, we investigate whether it is possible to bootstrap a named entity tagger for textual databases by exploiting the database structure to automatically generate domain and database-specific gazetteer lists. We compare three tagging strategies: (i) using the extracted gazetteers in a look-up tagger, (ii) using the gazetteers to automatically extract training data to train a database-...
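Strategy (i), a look-up tagger driven by gazetteers, can be illustrated with a minimal greedy longest-match tagger. This is a sketch under stated assumptions, not the authors' implementation: the gazetteer dictionary layout, the BIO tag scheme, case-insensitive matching and the function names `build_index` and `lookup_tag` are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: entity label -> list of (possibly multi-word) names.
Gazetteers = Dict[str, List[str]]


def build_index(gazetteers: Gazetteers) -> Tuple[Dict[Tuple[str, ...], str], int]:
    """Map each lower-cased, tokenized gazetteer entry to its label."""
    index: Dict[Tuple[str, ...], str] = {}
    max_len = 0
    for label, entries in gazetteers.items():
        for entry in entries:
            key = tuple(entry.lower().split())
            index[key] = label
            max_len = max(max_len, len(key))
    return index, max_len


def lookup_tag(tokens: List[str], gazetteers: Gazetteers) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup (case-insensitive)."""
    index, max_len = build_index(gazetteers)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest span first, so "New York City" wins over "New York".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = index.get(tuple(lowered[i:i + n]))
            if label is not None:
                match = (n, label)
                break
        if match is None:
            i += 1
            continue
        n, label = match
        tags[i] = f"B-{label}"
        for j in range(i + 1, i + n):
            tags[j] = f"I-{label}"
        i += n
    return tags


gaz = {"LOC": ["New York", "Paris"], "ORG": ["United Nations"]}
print(lookup_tag("The United Nations met in New York".split(), gaz))
# ['O', 'B-ORG', 'I-ORG', 'O', 'O', 'B-LOC', 'I-LOC']
```

A pure look-up tagger like this cannot resolve ambiguous or unseen names, which is why the other strategies use the gazetteers to produce training data for a learned tagger instead.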
Named Entity Recognition in Hindi using Maximum Entropy and Transliteration
...(NER) system becomes challenging if proper resources are not available. Gazetteer lists are often used for the development of NER systems. In many resource-poor languages, gazetteer lists of proper size are not available, but sometimes relevant lists are available in English. Proper transliteration makes the English lists useful in the NER tasks for such languages. In this paper, we have describ...
Bootstrapping Multilingual Geographical Gazetteers from Corpora
This paper presents an approach to automatically generating multilingual geographical name gazetteers via two bootstrapping loops on different corpora. First, a small seed list of geographical names is matched to an unannotated dataset in one language, and training data for a memory-based classifier is generated. Memory-based learning is applied to extend the gazetteer. Then a cross-over...
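The first loop (match seeds, generate training data, extend the gazetteer with a memory-based classifier) can be sketched as a single bootstrapping iteration. Everything here is a hypothetical simplification, not the authors' system: names are assumed to be single tokens, unseen capitalized tokens are the candidates, lower-case tokens serve as negative examples, the only features are the neighbouring words, and the classifier is a plain nearest-neighbour vote with the overlap metric rather than a full memory-based learner. The function names are illustrative, and the cross-lingual second loop is not shown.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical feature vector for a token: (previous token, next token), lower-cased.
Features = Tuple[str, str]


def context(tokens: List[str], i: int) -> Features:
    prev_tok = tokens[i - 1].lower() if i > 0 else "<s>"
    next_tok = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return (prev_tok, next_tok)


def overlap_distance(a: Features, b: Features) -> int:
    """Overlap metric: number of feature positions whose values differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(query: Features, memory: List[Tuple[Features, str]]) -> str:
    """Nearest-neighbour vote: every stored instance at the minimum distance votes."""
    if not memory:
        return "O"
    best = min(overlap_distance(query, feats) for feats, _ in memory)
    votes = Counter(label for feats, label in memory
                    if overlap_distance(query, feats) == best)
    # Accept a new name only if it wins outright; ties fall back to "O".
    return "GEO" if votes["GEO"] > votes["O"] else "O"


def bootstrap_once(corpus: List[List[str]], seeds: Set[str]) -> Set[str]:
    """One iteration: label seed matches, store instances, extend the gazetteer."""
    memory: List[Tuple[Features, str]] = []
    candidates: List[Tuple[str, Features]] = []
    for tokens in corpus:
        for i, tok in enumerate(tokens):
            feats = context(tokens, i)
            if tok in seeds:
                memory.append((feats, "GEO"))    # seed match -> positive example
            elif tok[:1].isupper():
                candidates.append((tok, feats))  # unknown capitalized token
            else:
                memory.append((feats, "O"))      # lower-case token -> negative example
    new_names = {tok for tok, feats in candidates if classify(feats, memory) == "GEO"}
    return seeds | new_names


corpus = [s.split() for s in [
    "She lived in Paris , near the river",
    "He moved to Berlin , then left",
    "They travelled in Vienna , for work",
]]
print(sorted(bootstrap_once(corpus, {"Paris", "Berlin"})))
# ['Berlin', 'Paris', 'Vienna']
```

"Vienna" is added because its context ("in", ",") exactly matches a stored seed instance, while sentence-initial words such as "She" have no close positive neighbours. In a full bootstrapping setup the returned list would be fed back in as the new seed set and the iteration repeated until no new names appear.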